Search for: All records

Creators/Authors contains: "Wang, Sijian"

  1. There are continuous efforts to elucidate the structure and biological functions of short hydrogen bonds (SHBs), whose donor and acceptor heteroatoms reside more than 0.3 Å closer than the sum of their van der Waals radii. In this work, we evaluate 1070 atomic-resolution protein structures and characterize the common chemical features of SHBs formed between the side chains of amino acids and small molecule ligands. We then develop a machine learning assisted prediction of protein-ligand SHBs (MAPSHB-Ligand) model and reveal that the types of amino acids and ligand functional groups as well as the sequence of neighboring residues are essential factors that determine the class of protein-ligand hydrogen bonds. The MAPSHB-Ligand model and its implementation on our web server enable the effective identification of protein-ligand SHBs in proteins, which will facilitate the design of biomolecules and ligands that exploit these close contacts for enhanced functions.
  2. Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease–substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins. 
  3. Short hydrogen bonds (SHBs), whose donor and acceptor heteroatoms lie within 2.7 Å, exhibit prominent quantum mechanical character and are connected to a wide range of essential biomolecular processes. However, exact determination of the geometry and functional roles of SHBs requires a protein structure at atomic resolution. In this work, we analyze 1260 high-resolution peptide and protein structures from the Protein Data Bank and develop a boosting-based machine learning model to predict the formation of SHBs between amino acids. This model, which we name machine learning assisted prediction of short hydrogen bonds (MAPSHB), takes into account 21 structural, chemical and sequence features and their interaction effects and effectively classifies each hydrogen bond in a protein as either short or normal. The MAPSHB model reveals that the type of the donor amino acid plays a major role in determining the class of a hydrogen bond and that the side chain Tyr-Asp pair demonstrates a significant probability of forming an SHB. Combining electronic structure calculations and energy decomposition analysis, we elucidate how the interplay of competing intermolecular interactions stabilizes the Tyr-Asp SHBs more than other commonly observed combinations of amino acid side chains. The MAPSHB model, which is freely available on our web server, allows one to accurately and efficiently predict the presence of SHBs given a protein structure with moderate or low resolution and will facilitate the experimental and computational refinement of protein structures.
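The two geometric definitions of an SHB quoted in the abstracts above (a donor-acceptor distance more than 0.3 Å below the sum of the van der Waals radii, or a distance within 2.7 Å) can be illustrated with a minimal Python sketch. This is not the MAPSHB or MAPSHB-Ligand model, which are machine learning classifiers; it only encodes the distance criteria. The function names are hypothetical, and the radii are Bondi values for N and O, assumed here because the abstracts do not state which radii set the authors use.

```python
import math

# Bondi van der Waals radii in angstroms. ASSUMPTION: the abstracts do not
# say which radii the authors used; these are common textbook values.
VDW_RADII = {"N": 1.55, "O": 1.52}


def heteroatom_distance(donor_xyz, acceptor_xyz):
    """Euclidean distance (in angstroms) between donor and acceptor atoms."""
    return math.dist(donor_xyz, acceptor_xyz)


def is_short_by_vdw(donor_elem, acceptor_elem, distance, margin=0.3):
    """Criterion from abstract 1: distance is more than `margin` angstroms
    shorter than the sum of the two van der Waals radii."""
    vdw_sum = VDW_RADII[donor_elem] + VDW_RADII[acceptor_elem]
    return distance < vdw_sum - margin


def is_short_by_cutoff(distance, cutoff=2.7):
    """Criterion from abstract 3: heteroatoms lie within `cutoff` angstroms."""
    return distance <= cutoff


if __name__ == "__main__":
    # Hypothetical coordinates for an O...O pair, not taken from any structure.
    d = heteroatom_distance((0.0, 0.0, 0.0), (0.0, 0.0, 2.5))
    print(d, is_short_by_vdw("O", "O", d), is_short_by_cutoff(d))
```

With these assumed radii, the first criterion gives thresholds of about 2.74 Å for O-O pairs and 2.77 Å for N-O pairs, close to the fixed 2.7 Å cutoff used in the third abstract.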